Abstract

The approximation of tensors has important applications in various disciplines, but it remains an extremely challenging task. It is well known that tensors of higher order can fail to have best low-rank approximations, with the important exception that best rank-one approximations always exist. The most popular approach to low-rank approximation is the alternating least squares (ALS) method. In this paper, the convergence of the ALS algorithm for the rank-one approximation problem is analysed. Our analysis focuses on the global convergence and the rate of convergence of the ALS algorithm. It is shown that the ALS method can converge sublinearly, Q-linearly, and even Q-superlinearly. Our theoretical results are illustrated on explicit examples.
1 Introduction


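We consider a minimisation problem on the tensor space $\mathcal{V} = \bigotimes_{\mu=1}^{d} \mathbb{R}^{n_\mu}$ equipped with the Euclidean inner product $\langle\cdot,\cdot\rangle$. The objective function $f : \mathcal{V} \to \mathbb{R}$ of the optimisation task is quadratic,

$$ f(v) := \frac{1}{\|b\|^2}\left[\frac{1}{2}\langle v, v\rangle - \langle b, v\rangle\right], \qquad (1) $$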


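where $b \in \mathcal{V}$. In our analysis, a tensor $u \in \mathcal{V}$ is represented as a rank-one tensor. The representation of rank-one tensors is described by the following multilinear map $U$:

$$ U : P := \mathbb{R}^{n_1}\times\dots\times\mathbb{R}^{n_d} \to \mathcal{V}, \qquad (p_1,\dots,p_d) \mapsto U(p_1,\dots,p_d) := \bigotimes_{\mu=1}^{d} p_\mu. $$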


We call a $d$-tuple of vectors $(p_1,\dots,p_d) \in P$ a representation system of $u$ if $u = U(p_1,\dots,p_d)$. The tensor $b$ is approximated with respect to rank-one tensors, i.e. we are looking for a representation system $(p_1^*,\dots,p_d^*) \in P$ such that for

$$ F := f\circ U : P \to \mathcal{V} \to \mathbb{R}, \qquad F(p_1,\dots,p_d) = \frac{1}{\|b\|^2}\left[\frac{1}{2}\langle U(p_1,\dots,p_d), U(p_1,\dots,p_d)\rangle - \langle b, U(p_1,\dots,p_d)\rangle\right] \qquad (2) $$

we have

$$ F(p_1^*,\dots,p_d^*) = \min_{(p_1,\dots,p_d)\in P} F(p_1,\dots,p_d). \qquad (3) $$

The range set $U(P)$ is closed in $\mathcal{V}$, see [6]. Therefore, the approximation problem is well defined. The set of best rank-one approximations of the tensor $b$ is denoted by

$$ \mathcal{M}_b := \{v \in U(P) : v \text{ is a best rank-one approximation of } b\}. \qquad (4) $$

The alternating least squares (ALS) algorithm ||2l |3l IH |71 [H Hi] [121 is recursively defined. Suppose that the $k$-th iterate $p^k = (p_1^k,\dots,p_d^k)$ and the first $\mu - 1$ components $p_1^{k+1},\dots,p_{\mu-1}^{k+1}$ of the $(k+1)$-th iterate $p^{k+1}$ have been determined. The basic step of the ALS algorithm is to compute the minimum norm solution

$$ p_\mu^{k+1} := \operatorname{argmin}_{q_\mu \in \mathbb{R}^{n_\mu}} F(p_1^{k+1},\dots,p_{\mu-1}^{k+1}, q_\mu, p_{\mu+1}^{k},\dots,p_d^{k}). $$

Thus, in order to obtain $p^{k+1}$ from $p^k$, we have to solve successively $d$ ordinary least squares problems.
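The recursive definition above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the helper names `rank_one`, `contract_all_but` and `als_sweep` are hypothetical, modes are numbered from 0, and the small test tensor is an assumption chosen for illustration. Each micro step uses the closed-form least squares solution, namely the contraction of $b$ with all other factors divided by the product of their squared norms.

```python
import numpy as np

def rank_one(ps):
    """Assemble U(p_1, ..., p_d) as the outer product of the vectors in ps."""
    v = ps[0]
    for p in ps[1:]:
        v = np.multiply.outer(v, p)
    return v

def contract_all_but(b, ps, mu):
    """Contract b with p_nu along every mode nu != mu; returns a vector in R^{n_mu}."""
    t = b
    # Work from the last mode to the first so that earlier axis numbers stay valid.
    for nu in reversed(range(b.ndim)):
        if nu != mu:
            t = np.tensordot(t, ps[nu], axes=([nu], [0]))
    return t

def als_sweep(b, ps):
    """One ALS iteration: d successive least squares updates, one per mode."""
    ps = [p.copy() for p in ps]
    for mu in range(b.ndim):
        scale = np.prod([p @ p for nu, p in enumerate(ps) if nu != mu])
        ps[mu] = contract_all_but(b, ps, mu) / scale
    return ps

# Hypothetical test tensor: b = 2 e1 (x) e1 (x) e1 + e2 (x) e2 (x) e2.
b = np.zeros((2, 2, 2))
b[0, 0, 0], b[1, 1, 1] = 2.0, 1.0
ps = [np.array([1.0, 0.3]) for _ in range(3)]
for _ in range(10):
    ps = als_sweep(b, ps)
v = rank_one(ps)
```

For this starting point the iterates approach the dominant term of the test tensor; other starting points may end up at the second term.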

The ALS algorithm is a nonlinear Gauss-Seidel method. The local convergence of the nonlinear Gauss-Seidel method to a stationary point $p^* \in P$ follows from the convergence of the linear Gauss-Seidel method applied to the Hessian $F''(p^*)$ at the limit point $p^*$. If the linear Gauss-Seidel method converges R-linearly, then there exists a neighbourhood $B(p^*)$ of $p^*$ such that for every initial guess $p^0 \in B(p^*)$ the nonlinear Gauss-Seidel method converges R-linearly with the same rate as the linear Gauss-Seidel method. We refer the reader to Ortega and Rheinboldt for a description of the nonlinear Gauss-Seidel method [10, Section 7.4] and its convergence analysis [10, Thm. 10.3.5, Thm. 10.3.4, and Thm. 10.1.3]. A representation system of a represented tensor is not unique, since the map $U$ is multilinear. Consequently, the matrix $F''(p^*)$ is not positive definite. Therefore, convergence of the linear Gauss-Seidel method is in general not ensured. However, the convergence of the ALS method is discussed in li^ fTSlfTSlfT^ . Recently, the convergence of the ALS method was analysed by means of the Łojasiewicz gradient inequality, please see iflAII for more details. The current analysis is based neither on the mathematical techniques developed for the nonlinear Gauss-Seidel method nor on the theory of Łojasiewicz inequalities, but on the multilinearity of the map $U$.

Notation 1.1 ($\mathbb{N}_n$). The set $\mathbb{N}_n$ of natural numbers not larger than $n \in \mathbb{N}$ is denoted by

$$ \mathbb{N}_n := \{j \in \mathbb{N} : 1 \le j \le n\}. $$


The precise analysis of the ALS method is a quite challenging task. Some of the difficulties of the theoretical 
understanding are explained in the following examples. 

Example 1.2. The approximation of $b \in \mathcal{V}$ by a tensor of rank one is considered, where

$$ b = \sum_{j=1}^{r} \lambda_j \underbrace{\bigotimes_{\mu=1}^{d} b_{j\mu}}_{b_j :=}, \qquad \lambda_1 \ge \dots \ge \lambda_r > 0, \quad \|b_{j\mu}\| = 1, \qquad (5) $$

$$ B_\mu := (b_{j\mu} : 1 \le j \le r) \in \mathbb{R}^{n_\mu \times r} \quad (1 \le \mu \le d), $$

and $B_\mu^T B_\mu = \mathrm{Id}$, see the example in [Section 4.3.5]. Let us further assume that $v_k = p_1^k \otimes p_2^k \otimes \dots \otimes p_d^k$ is already determined. Corollary 2.4 leads to the recursion




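$$ p_1^{k+1} = \underbrace{\frac{1}{\|p_1^k\|^2\,\|p_d^k\|^2}\; B_1 \operatorname{diag}\!\left(\lambda_j^2 \prod_{\mu=2}^{d-1} \frac{\langle b_{j\mu}, p_\mu^k\rangle^2}{\|p_\mu^k\|^4}\right)_{j=1,\dots,r} B_1^T}_{G_1(p_1^k,\dots,p_d^k)\,:=}\; p_1^k \qquad (k \ge 2). \qquad (6) $$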




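The linear map $G_1(p_1^k,\dots,p_d^k) \in \mathbb{R}^{n_1\times n_1}$ describes the first micro step $p_1^k \otimes p_2^k \otimes \dots \otimes p_d^k \mapsto p_1^{k+1} \otimes p_2^k \otimes \dots \otimes p_d^k$ in the ALS algorithm. The iteration matrix $G_1(p_1,\dots,p_d)$ is invariant under rescaling of the representation system, i.e. $G_1(\alpha_1 p_1,\dots,\alpha_d p_d) = G_1(p_1,\dots,p_d)$ for $\prod_{\mu=1}^{d}\alpha_\mu = 1$. Further, we can illustrate the difficulties of the ALS iteration in higher dimensions. For $d = 2$, the ALS method is given by the two power iterations

$$ p_1^{k+1} = \frac{1}{\|p_1^k\|^2\|p_2^k\|^2}\, B_1 \operatorname{diag}(\lambda_j^2)_{j=1,\dots,r}\, B_1^T p_1^k, \qquad p_2^{k+1} = \frac{1}{\|p_1^{k+1}\|^2\|p_2^k\|^2}\, B_2 \operatorname{diag}(\lambda_j^2)_{j=1,\dots,r}\, B_2^T p_2^k. $$

Clearly, if the global minimum $b_1$ is isolated, i.e. $\lambda_1 > \lambda_2$, then the ALS method converges to $b_1$ provided that $\langle v_0, b_1\rangle \ne 0$, where $v_0 = p_1^0 \otimes p_2^0 \in \mathcal{V}$ is the initial guess. Further, we have linear convergence

$$ \tan\angle[b_{1\mu}, p_\mu^{k+1}] \le \left(\frac{\lambda_2}{\lambda_1}\right)^{2} \tan\angle[b_{1\mu}, p_\mu^{k}] \qquad (1 \le \mu \le 2). $$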
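The power-iteration structure for $d = 2$ can be observed numerically. The sketch below is a hedged illustration under simplifying assumptions that are not taken from the paper: $B_1 = B_2 = \mathrm{Id}$, two terms with weights $\lambda = (2, 1)$, and an arbitrary starting vector. It records the quotient of successive tangents for the first factor, which in this setting equals $(\lambda_2/\lambda_1)^2$.

```python
import numpy as np

lam = np.array([2.0, 1.0])          # assumed weights with lambda_1 > lambda_2
b = np.diag(lam)                    # b = sum_j lambda_j e_j e_j^T, i.e. B_1 = B_2 = Id

def tangent(p):
    """Tangent of the angle between p and the first unit vector e_1 (p in R^2)."""
    return abs(p[1]) / abs(p[0])

p1 = np.array([1.0, 0.8])           # arbitrary initial guess with <p1, e_1> != 0
p2 = b.T @ p1 / (p1 @ p1)           # least squares update of the second factor
ratios = []
for _ in range(6):
    old = tangent(p1)
    p1 = b @ p2 / (p2 @ p2)         # micro step for the first factor
    p2 = b.T @ p1 / (p1 @ p1)       # micro step for the second factor
    ratios.append(tangent(p1) / old)
```

The constant quotient reflects that one full ALS iteration applies the matrix $B_1\operatorname{diag}(\lambda_j^2)B_1^T$ once, up to scaling.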

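Note that in this example the angle $\angle[b_{1\mu}, p_\mu^k]$ is a more natural measure of the error than the usual distance $\|b_{1\mu} - p_\mu^k\|$. For $d \ge 3$, the factor $\prod_{\mu=2}^{d-1}\langle b_{j\mu}, p_\mu^k\rangle^2$ from Eq. (6) describes the behaviour of the ALS iteration. Let $1 \le j^* \le r$. We say that a term $b_{j^*}$ from Eq. (5) dominates at $v_k = p_1^k\otimes\dots\otimes p_d^k$ if

$$ \sqrt[d-2]{\lambda_{j^*}^2}\,\langle b_{j^*\mu}, p_\mu^k\rangle^2 > \sqrt[d-2]{\lambda_j^2}\,\langle b_{j\mu}, p_\mu^k\rangle^2 \qquad (7) $$

for all $j \in N_{j^*} := \{j \in \mathbb{N} : 1 \le j \le r \text{ and } j \ne j^*\}$ and all $\mu \in \mathbb{N}_d$. If $b_{j^*}$ dominates at $v_k$, then the recursion formula (6) leads to

$$ \tan\angle[b_{j^*1}, p_1^{k+1}] \le \underbrace{\frac{\max_{j\in N_{j^*}}\left(\lambda_j^2\prod_{\mu=2}^{d-1}\langle b_{j\mu}, p_\mu^k\rangle^2\right)}{\lambda_{j^*}^2\prod_{\mu=2}^{d-1}\langle b_{j^*\mu}, p_\mu^k\rangle^2}}_{<1}\,\tan\angle[b_{j^*1}, p_1^k], \qquad (8) $$

i.e. the first component of the representation system $p_1^{k+1}$ is turned towards the direction of $b_{j^*1}$. Note that for $r = 2$ the bound for the convergence rate is sharp, i.e.

$$ \tan\angle[b_{j^*1}, p_1^{k+1}] = \frac{\max_{j\in N_{j^*}}\left(\lambda_j^2\prod_{\mu=2}^{d-1}\langle b_{j\mu}, p_\mu^k\rangle^2\right)}{\lambda_{j^*}^2\prod_{\mu=2}^{d-1}\langle b_{j^*\mu}, p_\mu^k\rangle^2}\,\tan\angle[b_{j^*1}, p_1^k] \qquad (r = 2). \qquad (9) $$

The inequality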




"P/A?* (6 


x4 TT {bjW^Pp) d-2[W /, 


p=2 

d-l 


> 


n ‘-ff {biPPiy = 

11=2 ''PG\ 
shows that $b_{j^*}$ also dominates at the successor $p_1^{k+1} \otimes p_2^k \otimes \dots \otimes p_d^k$. Further, we have for all $j \in N_{j^*}$


m =^2 


/ 


d-l 

n 


\ 




!i-2/\2 (r._^ ^kV 
Y 2 'j* \^3 UiPu/ 


K 


-tm{KuiAY 


< 




{br,FP\? 


<i 




By analogy for the following micro steps, we have 


(Aj n ^=2 ^ maxjgjv,.. (Aj 0^=2 {bju^pf,}) 

I ^ ! I o I I o 


0/^=2 (Aj* 0^1=2 i^ru^Pu)') 

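Hence, the ALS iteration converges to $b_{j^*}$. Now it is easy to see that

$$ \limsup_{k\to\infty}\left(\frac{\max_{j\in N_{j^*}}\left(\lambda_j^2\prod_{\mu=2}^{d-1}\langle b_{j\mu}, p_\mu^k\rangle^2\right)}{\lambda_{j^*}^2\prod_{\mu=2}^{d-1}\langle b_{j^*\mu}, p_\mu^k\rangle^2}\right) = 0. $$

Therefore, the tangent $\tan\angle[b_{j^*\mu}, p_\mu^k]$ converges Q-superlinearly, i.e.

$$ \tan\angle[b_{j^*\mu}, p_\mu^k] \xrightarrow[k\to\infty]{} 0 \quad (\text{Q-superlinearly}). $$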
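The Q-superlinear behaviour can be made visible with a small experiment. The following is a minimal sketch under assumptions made only for illustration ($d = 3$, $r = 2$, weights $2$ and $1$, unit vectors as factors, and a starting point at which the first term dominates). It tracks the tangent of the angle between the first factor and $e_1$ and the quotients of successive tangents, which shrink towards zero.

```python
import numpy as np

# Assumed instance of Eq. (5): d = 3, r = 2, b = 2 e1^{(x)3} + e2^{(x)3}.
b = np.zeros((2, 2, 2))
b[0, 0, 0], b[1, 1, 1] = 2.0, 1.0

def tangent(p):
    """Tangent of the angle between p and e_1 (p in R^2)."""
    return abs(p[1]) / abs(p[0])

p1, p2, p3 = (np.array([1.0, 0.5]) for _ in range(3))
tangents = []
for _ in range(4):
    # Three least squares micro steps, one per factor.
    p1 = np.einsum('ijk,j,k->i', b, p2, p3) / ((p2 @ p2) * (p3 @ p3))
    p2 = np.einsum('ijk,i,k->j', b, p1, p3) / ((p1 @ p1) * (p3 @ p3))
    p3 = np.einsum('ijk,i,j->k', b, p1, p2) / ((p1 @ p1) * (p2 @ p2))
    tangents.append(tangent(p1))

# Quotients of successive tangents; superlinear convergence means they tend to 0.
quotients = [t_next / t for t, t_next in zip(tangents, tangents[1:])]
```

Only four iterations are used because the tangents fall below floating point resolution very quickly.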


Furthermore, the ALS iteration converges faster for large $d$. Unfortunately, there is no guarantee that the global minimum $b_1$ dominates at $v_k$. However, in this example it is more likely that the global minimum dominates at a chosen initial guess. For simplicity let us assume that $r = 2$ and $\lambda_1 > \lambda_2$, see Eq. (5). Since the Tucker ranks of $b$ are all equal to 2 and the condition from Eq. (7) does not depend on the norm of the vectors from the representation system, assume without loss of generality that for $\mu \in \mathbb{N}_d$ the representation system of every initial guess has the following form:


$$ p_\mu(\varphi_\mu) = \sin(\varphi_\mu)\, b_{2\mu} + \cos(\varphi_\mu)\, b_{1\mu}, \qquad \varphi_\mu \in \left[0, \tfrac{\pi}{2}\right], \qquad \|p_\mu(\varphi_\mu)\| = 1. $$

If the global minimum dominates at the initial guess, we have for all $\mu \in \mathbb{N}_d$

$$ 0 \le \tan(\varphi_\mu) < \sqrt[d-2]{\frac{\lambda_1}{\lambda_2}}. $$

If we define the angle $\tilde\varphi \in [0, \frac{\pi}{2}]$ such that

$$ \tan\tilde\varphi = \sqrt[d-2]{\frac{\lambda_1}{\lambda_2}}, $$

then every initial guess with $\varphi_\mu \in [0, \tilde\varphi)$ converges to the global minimum. Furthermore, we have $\tilde\varphi > \frac{\pi}{4}$, i.e. the slice where the global minimum is a point of attraction is larger than the slice where the local minimum $\lambda_2 b_2$ is a point of attraction, see Figure 1 for illustration. But we have for the asymptotic behaviour

$$ \tilde\varphi \xrightarrow[d\to\infty]{} \frac{\pi}{4}, $$

i.e. for sufficiently large $d$ the slices are practically of equal size.

Figure 1: The angle $\tilde\varphi$ describes the slice where the global minimum is a point of attraction. Every initial guess located under the red line will converge to the global minimum. Note that the angle $\tilde\varphi$ is larger than $\frac{\pi}{4}$, but interestingly enough $\tilde\varphi \to \frac{\pi}{4}$ as $d \to \infty$.

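Example 1.3. In the following example, sublinear convergence of the ALS procedure for rank-one approximation is shown. We consider the tensor $b_\lambda$ given by

$$ b_\lambda = \bigotimes_{\mu=1}^{3} p + \lambda\,(p \otimes q \otimes q + q \otimes p \otimes q + q \otimes q \otimes p) $$

for some $\lambda > 0$ and $p, q \in \mathbb{R}^n$ with $\|p\| = \|q\| = 1$ and $\langle p, q\rangle = 0$. Let us first prove the following statement.

Proposition 1.4. Define $v^* := \bigotimes_{\mu=1}^{3} p$. Then

a) $\mathcal{M}_{b_\lambda} = \{v^*\}$, if $\lambda \le \frac{1}{2}$,

b) $|\mathcal{M}_{b_\lambda}| = 2$ and $v^* \notin \mathcal{M}_{b_\lambda}$, if $\lambda > \frac{1}{2}$.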
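Proof. Let $v_\lambda^* \in \mathcal{M}_{b_\lambda}$. Since the tensor $b_\lambda$ is symmetric, $v_\lambda^*$ also has to be symmetric. Write $v_\lambda^* = C_\lambda\, p_\lambda \otimes p_\lambda \otimes p_\lambda$, where $p_\lambda = p + \alpha_\lambda q$ (this is possible, since $\langle b_\lambda, q \otimes q \otimes q\rangle = 0$). Now the tuple $(C_\lambda p_\lambda, p_\lambda, p_\lambda)$ is a stationary point of $F$, therefore

$$ (\mathrm{Id}_{\mathbb{R}^n} \otimes p_\lambda \otimes p_\lambda)^T b_\lambda = C p_\lambda $$

for some $C \in \mathbb{R}$. But

$$ (\mathrm{Id}_{\mathbb{R}^n} \otimes p_\lambda \otimes p_\lambda)^T b_\lambda = (1 + \lambda\alpha_\lambda^2)\, p + 2\lambda\alpha_\lambda\, q, $$

hence

$$ \frac{2\lambda\alpha_\lambda}{1 + \lambda\alpha_\lambda^2} = \alpha_\lambda. \qquad (10) $$

The solutions of (10) are

$$ \alpha_\lambda = \begin{cases} 0, & \text{if } \lambda \le \frac{1}{2}, \\ 0,\ \sqrt{\frac{2\lambda - 1}{\lambda}} \ \text{or}\ -\sqrt{\frac{2\lambda - 1}{\lambda}}, & \text{if } \lambda > \frac{1}{2}. \end{cases} $$

Straightforward calculations show that for $\lambda > \frac{1}{2}$ the solutions $\alpha_\lambda = \pm\sqrt{\frac{2\lambda-1}{\lambda}}$ lead to the same value of $F$, which is smaller than $f(v^*)$. ■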
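The case distinction in the proof can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original argument: it restricts attention to symmetric candidates $C\,(p+\alpha q)^{\otimes 3}$ and uses that, after minimising over the scaling $C$, the objective is a negative multiple of $\langle b_\lambda, u\rangle^2$ for the normalised candidate $u$. With $\langle p, q\rangle = 0$ this quantity is $(1 + 3\lambda\alpha^2)^2/(1+\alpha^2)^3$; the function name `score` and the sample values of $\lambda$ are hypothetical.

```python
import numpy as np

def score(alpha, lam):
    """<b_lambda, u>^2 for the unit rank-one tensor u built from p + alpha*q.

    After minimising over the scaling of u, the objective is a negative
    multiple of this score, so a larger score means a better approximation.
    """
    return (1.0 + 3.0 * lam * alpha**2) ** 2 / (1.0 + alpha**2) ** 3

alphas = np.linspace(-3.0, 3.0, 6001)

# lambda <= 1/2: alpha = 0 (i.e. v*) is the best symmetric candidate on the grid.
best_small = alphas[np.argmax(score(alphas, 0.4))]

# lambda > 1/2: the nonzero roots of the stationarity equation beat alpha = 0.
lam = 0.8
root = np.sqrt((2.0 * lam - 1.0) / lam)
gain = score(root, lam) - score(0.0, lam)
```

A positive `gain` corresponds to the statement that the two nonzero solutions give a smaller objective value than $v^*$.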


Now let \ < ^ and = C^Pi ®P 2 ®---®P% with = c^p + = 1 and some E R. 


Define 7 ^,^ := 


_ I 


S/_i,k 


. Applying Corollarv \2.4\ one gets after short calculations the recursion formula 


with some C\^k € Rand 


Then for fi fc := it holds 

’ ^l.k 




7i,fe+i — 

AT , = ( \ 

’ V Ac 2 ,A: / 


A(A + l)c 2 ,fcCi,fc^^ + A^ 


Sl,k 


^2,k + + ^)^'^2,feSl,fc Cl,fc 


( 11 ) 


Thanks to Corollary 3.16 and Proposition 1.4 we know that $\lim_{k\to\infty} v_k = v^*$ for $v^* = \bigotimes_{\mu=1}^{3} p$, therefore

$$ \lim_{k\to\infty} c_{\mu,k} = 1, \qquad \lim_{k\to\infty} s_{\mu,k} = 0 $$

for p E IN 3 . From Eq. \l2\and\rJ\one gets 

lim sup = A^ + A(A + 1) lim sup 

fc—>00 ^l,fc k-^-oo ^l,k 

The same way 


lim sup = A^ + A(A + 1) lim sup 

k^oo ^2,fc k^oo ^2,k 

lim sup = A^ + A(A + 1) lim sup 

fc —>-00 ^3,fc fc—>cx> 'S 3 ,A: 


( 12 ) 

(13) 

(14) 

(15) 

(16) 
Furthermore, from Eq. (EB we know that


P2,k+1 — C2,k+lM2,kPl,k+l 


with some C 2 ^k+i € 




Simple calculations result in the relation 


C 3 ,k 


S2,k+1 

■Sl,fc +1 


53 ^ 

A-^—ci^fc+i + Acs^fc, 

'Sl,fc+l 


and hence 

lim sup = A + A lim sup (17) 

k^oo ^l,k k—foo ^l,fc+l 

Now let A = ^. ^lim > 1, then from Eq. (fT4l) follows lim supk^oo conver¬ 

gence ofpi^k to p can not be Q-linearly. ^limsup^_j.oo < 1, then from Eq. ([TtI) limsup^j^go 

so from Eq. (ITSl ) limsup^j^o^ 

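Remark 1.5.

a) In fact, for $\lambda = \frac{1}{2}$ it holds

$$ \limsup_{k\to\infty}\frac{s_{2,k}}{s_{1,k}} = \limsup_{k\to\infty}\frac{s_{3,k}}{s_{2,k}} = \limsup_{k\to\infty}\frac{s_{1,k+1}}{s_{3,k}} = 1. $$

b) For $\lambda < \frac{1}{2}$ ALS converges Q-linearly with the convergence rate

$$ \rho = \frac{\lambda}{2}\left(3\lambda + \lambda^2 + \sqrt{(3\lambda + \lambda^2)^2 + 4\lambda}\right). $$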


c) The example can be extended to higher dimensions in the following way. Let 

d 

bx = (^p+ X'^ l(^q®p^ q 

fl=l fl=l \ Z/=l U=fl-\-l 

with IIpII = \\q\\ and {p,q) = 0. Then v* = (^^=iP is the unique best rank-one approximation ofb\ if 
and only if X < Eurthermore, ALS converges sublinearfor X = and Q-linearfor X < 1 ^^. 
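As a quick arithmetic check of the rate in part b), read here as $\rho(\lambda) = \frac{\lambda}{2}\big(3\lambda + \lambda^2 + \sqrt{(3\lambda+\lambda^2)^2 + 4\lambda}\big)$ (the same expression reappears in Remark 3.15), the sketch below evaluates it at two sample values. The function name `rate` is hypothetical. The value at $\lambda = \frac12$ is exactly 1, which matches the loss of Q-linear convergence at the critical parameter, and for $\lambda = 0.2$ the rate is approximately 0.174.

```python
import math

def rate(lam):
    """Convergence rate rho from Remark 1.5 b) as a function of lambda."""
    s = 3.0 * lam + lam**2
    return 0.5 * lam * (s + math.sqrt(s**2 + 4.0 * lam))

rho_small = rate(0.2)   # value in the Q-linear regime, lambda < 1/2
rho_crit = rate(0.5)    # value at the critical parameter lambda = 1/2
```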

Our new convergence results are not obtained by conventional techniques such as the analysis of the nonlinear Gauss-Seidel method or the theory of Łojasiewicz inequalities. Therefore, a detailed convergence analysis is necessary.


2 The Alternating Least Squares Algorithm 

In the following section, we recall the ALS algorithm. The algorithmic description of the ALS method is given in Algorithm 1.
Algorithm 1 Alternating Least Squares (ALS) Algorithm
1: Set $k := 1$ and choose an initial guess $p^1 = (p_1^1, \dots, p_d^1) \in P$, $p^{1,0} := p^1$, and $v_1 := U(p^1) \ne 0$.
2: while Stop Condition do
3: $v_{k,0} := v_k$
4: for $1 \le \mu \le d$ do

5: 


P 


k+l 


-k,tJ,+l 


pk+l 


k+l 

P/x-l 



2 "O' vgy 


Pi 


PfM-l 


Id 




iPl 5 • • • ) PfjL—1 jPfi jP/i+l: ■ ■ ■ iPl) 






^Lm+ 1 := ^(Pfc,^+i) 



(18) 


6: end for
7: $p^{k+1} := p^{k,d}$ and $v_{k+1} := U(p^{k+1})$
8: $k \mapsto k + 1$
9: end while


Notation 2.1 ($L(A, B)$, $P_{\nu,\mu}$). Let $A$, $B$ be two arbitrary vector spaces. The vector space of linear maps from $A$ to $B$ is denoted by

$$ L(A, B) := \{M : A \to B : M \text{ is linear}\}. $$

Let $\mu, \nu \in \mathbb{N}_d$ with $\nu \ne \mu$. We define

$$ P_{\nu,\mu} := \mathbb{R}^{n_1} \times \dots \times \mathbb{R}^{n_{\nu-1}} \times \mathbb{R}^{n_{\nu+1}} \times \dots \times \mathbb{R}^{n_{\mu-1}} \times \mathbb{R}^{n_{\mu+1}} \times \dots \times \mathbb{R}^{n_d}. $$

The following map $M_{\nu,\mu}$ from Lemma 2.2 is important for the analytical understanding of the ALS algorithm. As Corollary 2.4 shows, the map $M_{\nu,\mu}$ describes a micro step of the ALS algorithm. Furthermore, there is an interesting relation between the map $M_{\nu,\mu}$ and rank-one best approximations of the tensor $b$, see Theorem 2.10.

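Lemma 2.2. Let $\mu, \nu \in \mathbb{N}_d$, $\nu \ne \mu$, and $p_{\nu,\mu} = (p_1, \dots, p_{\nu-1}, p_{\nu+1}, \dots, p_{\mu-1}, p_{\mu+1}, \dots, p_d) \in P_{\nu,\mu}$. There exists a multilinear map $M_{\nu,\mu} : P_{\nu,\mu} \times \mathcal{V} \to L(\mathbb{R}^{n_\nu}, \mathbb{R}^{n_\mu})$ such that

$$ M_{\nu,\mu}(p_{\nu,\mu}, b)\, g_\nu = (p_1 \otimes \dots \otimes p_{\nu-1} \otimes g_\nu \otimes p_{\nu+1} \otimes \dots \otimes p_{\mu-1} \otimes \mathrm{Id}_{\mathbb{R}^{n_\mu}} \otimes p_{\mu+1} \otimes \dots \otimes p_d)^T b \qquad (19) $$

for all $g_\nu \in \mathbb{R}^{n_\nu}$. Further, we have $M_{\nu,\mu}^T(p_{\nu,\mu}, b) = M_{\mu,\nu}(p_{\nu,\mu}, b)$.

Proof. Follows directly from the multilinearity of the tensor product and elementary calculations. ■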
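In coordinates, the map from Lemma 2.2 is obtained by contracting $b$ with every factor except the two selected ones, which leaves a matrix. The sketch below illustrates this under assumptions that are not from the paper: the helper name `pair_matrix` is hypothetical, modes are numbered from 0, and the random test tensor serves only to check the transposition property stated in the lemma.

```python
import numpy as np

def pair_matrix(b, ps, nu, mu):
    """Matrix of M_{nu,mu}(p_{nu,mu}, b): maps g_nu in R^{n_nu} to R^{n_mu}.

    The entries ps[nu] and ps[mu] are ignored; all other modes are contracted.
    """
    t = b
    # Contract from the last mode to the first so that axis numbers stay valid.
    for axis in reversed(range(b.ndim)):
        if axis not in (nu, mu):
            t = np.tensordot(t, ps[axis], axes=([axis], [0]))
    # Two axes remain, ordered as (min(nu, mu), max(nu, mu)); rows must index mode mu.
    return t.T if nu < mu else t

rng = np.random.default_rng(0)
b = rng.standard_normal((2, 3, 4))
ps = [rng.standard_normal(n) for n in b.shape]

M = pair_matrix(b, ps, 0, 1)        # maps R^2 (mode 0) to R^3 (mode 1)
g = rng.standard_normal(2)
direct = np.einsum('ijk,i,k->j', b, g, ps[2])   # same contraction written out
```

Swapping the two indices transposes the matrix, which is the relation $M_{\nu,\mu}^T = M_{\mu,\nu}$.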

Example 2.3. Let /r, z/ G / p, p^^^ = {pi,... ,py_i,p^+i,... ...,Pd) G and b be 

given in a subspace decomposition, i.e. 

ti td d 

^ = X] ■ ■ ■ X] 

h=l *d=l M=1 


8 


(V G ]N„^) 

























A matrix representation of the linear map is given by 

n {k.ivPi) 


*1=1 *I/=1 V=1 *d=l 

jT 


i&^d\{t^, *"} 


= B.rip^W,, 


■U, fl‘' 


where = ( 6 ^,i, • • •, /or all ^ € {p, o} and the entries of the matrix r(p^ are defined by 


ti ^u—1 '^u + 1 — l ^M+1 

[r(p,,^)](*.,v) = ^ ••• ••• ••• ••• Z] ••'Z%=.,*.) n ihivPd- 

*1=1 *y_i = l *i,+i=l *^_i=l *^+1=1 *d=l ^GlNd\{/i, **} 

Corollary 2.4. Lef p G k > 2, and Pj^ ^ • • • ^Pd) ^ P form Algorithm\I\ 

With the matrix from Lemma \T2\ the following recursion formula holds: 


A+l 


——- p% 


( 20 ) 


where 




r\ d 


:= n 

n 


U=1 



fi—2 

r\ d 

Gk, fi—i 

■= 

niN 


U=1 

U=fL 




fc+1 k 


Proof We have with Eq. (fT^ and Lemma [2^ 


^k+l 

r’fj, 


j,fc+i 

i^/i-i 


1 


Gk, /* 

1 


,... ,Pf,%Pf,+i, ...,Pd, b)P^-i^ 
T /' /C+I 


--■ ■ ■ ,p;%pUi, ■■■.Pd, b)pt.. 

e^fc, /*—1 


( 21 ) 

( 22 ) 


Example 2.5. Let Vk = p\® p\® ■ ■ ■ ® p^ and 

11 d 

b = Z ■■■ Z %=.,*.) ( 8 ) 

h=l *d=l A*=l 

i.e. the tensor b is given in the Tucker decomposition. From Eq. di 8 l) it follows 


= 


1 


ll td d 

T-rd II fcii^ Z Z n 

11^=2 Ik/^II ii=l id=l 


n 


1 


nJilAII iKit 

1 

njiTMl# 


^2 


^d -1 


ee'-m.e- e %. 

h=l *d=l *2=1 *d_i=l /*=2 


,fc|| '^A*d 


rf 

Pd 
where 1 G BJ^B^ = and the entries of the matrix

are defined by 


[ri,A:] 


il,id 


*2 ^d -1 d-1 

n 


*2=1 *d-l=l /*=2 


IIp^II 


(1 < Zi < ti, 1 < id < td). 


f/jaf Fi^fc is a diagonal matrix if the coefficient tensor j3 G (^^=i E-*'" is super- diagonal, see Eq. (| 6 l). For 
it follows further 


rfi - 

Pd — 


and finally 


n:=\ 


ti tfi d 1 

2 ■ ■ ■ y^ A*ivdd) ri ^d,id 


uW * 1=1 *£i=l 


k+l 

Pi = 


/*=! 

1 


rr'^-1 lUfc 


n 


/*=2 || 1 ^A* 




ni=i 


i_i=i iK/i 




Let u* = Xpi 0 ... ^ pd G TMfe be a rank-one best approximation of b. Without loss of generality we can 
assume that 

IIpiII = IIP2II = ■ ■ ■ = IlPdII = 1 and ||?;*|| = A. 

Further, let p,u ^ INd and 

Eu,fd (Pi, ■ ■ ■ ,Pu-i,Pu+i, ■ ■ ■ ,Pii-i,Pti+i, ■ ■ ■ ,Pd) e Pu,ii- 
The following two maps are of interest for our analysis: 

V : X ^ V 

{9u,gfi) ^ := Pi ® • • • ®Fi^-i ® ® • • • ®Pai-i ® ® 

and 


U : X 


V 


{9u,gii) ^ U{gy,gif) := {V{gu,g^f)ffi)V{gu,g^), 

where 5”“^ = {x G R"^ : ||x|| = 1} denotes the sphere in R"^. 

Lemma 2.6. Let ^ INd, gv £ and g^ G 5"'''“^. We have 

-2/(C/(5ii,5/i)) = gy,g^ = {U{gu,gfd),b) = \\U{gv,gfi)\[ 


Proof Let g^ G S'^'' ^ 
71 ( 5*0 5 / 1 ) ft and 

/ (ftft( 5 i/, 5 /i)) = 


, g^, G \ and define Tr{g,y,gf,) := V{g^, gf,){V{g^, g/,))'^ . It holds U{gy,gif) = 

^ (ftft( 5 ii, 5 /i),tft( 5 ii, 5 /i)) - {Uigu,gfi),b) = ^ (71^(511, 5 /i)ft, ft) - {'^{gu,gii)b,b) 

^ (71(5*05/1) ft, ft) - {'^{ 9 u,gii)b,b) = {Tr{g^,gi,)b,b) = {U{gy,g^)ffi) 

(71^(5*!, 5 /i)ft, ft) = (71(5*1, 5 /i)ft, 71(5*1, 5 /i)ft) = \\U{ 9 u,g^l)\\^ 

-^()^( 5 *i, 5 /i),ft)^ = -^((^**,/*(P^,^>ft)) 5 * 1 , 5 /i) • 


10 








Remark 2.7. Obviously, the minimisation problem from Eg. di]) is equivalent to the following constrained 
maximisation problem: Find v = p^ such that for all p € it holds 

{v,b) = max (v,b) subject to \\pa\\ = I- 

v&U{P) 


Lagrangian method for constrained optimisation leads to 


L\{qi, ■■■ ,qd) 


{U{qi, • • • , qd), 6 ) + ^ ^ (1 - ) , 

^l=l 


where q^ G R”'' and A = (Ai,--- , A^)^ G is the vector of Lagrange multipliers. A rank-one best 
approximation v* = Xpi ® ® pd ^ M.b with A G R and ||p^|| = 1 satisfies 

d j' 

—• • • ,pd) = {pi® ■ ■®p^+i® ■ ■ ■ ®pd) fe-A*j?^ = 0, 
■^Lx*ipi,--- ,Pd) = ^ (1-IIPmIP) =0- 

For V G \ {p} it follows that 


A = {pi® ■ ■ ■ ®Pd,b), Xp^i = My^^{p^^^,b)py, Xp^ = M^^^{p^^^,b)p^, 

where $p_{\nu,\mu} \in P_{\nu,\mu}$ is as in Lemma 2.2. Therefore, $\lambda$ is a singular value of the matrix $M_{\nu,\mu}(p_{\nu,\mu}, b)$ and $p_\nu$, $p_\mu$ are the associated singular vectors.

Proposition 2.8. Let $v^* = \lambda\, p_1 \otimes \dots \otimes p_d \in \mathcal{M}_b$ be a best approximation of $b$ with $\|p_1\| = \dots = \|p_d\| = 1$. We have


fiv*) 


1 



1 


{b,v*). 


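Proof. Since $v^* \in \mathcal{M}_b$ we have that $v^* = \Pi b$, where $\Pi := \frac{v^* (v^*)^T}{\|v^*\|^2}$. Furthermore, it holds

$$ \langle v^*, v^*\rangle = \langle \Pi b, v^*\rangle = \langle b, \Pi v^*\rangle = \langle b, v^*\rangle. $$

The rest follows from the definition of $f$, see Eq. (1). ■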

Remark 2.9. From Proposition 12.81 it follows instantly that the global minimum of the best approximation 
problem from Eq. (jj]) has the largest norm among all other v & Mb- 

Theorem 2.10. Let $\mu, \nu \in \mathbb{N}_d$ and $v^* = \|v^*\|\, p_1 \otimes \dots \otimes p_d \in \mathcal{M}_b$ be a rank-one best approximation of $b$ with $\|p_1\| = \dots = \|p_d\| = 1$. Then $\|v^*\|$ is the largest singular value of $M_{\nu,\mu}(p_{\nu,\mu}, b)$ and $p_\nu$, $p_\mu$ are the associated singular vectors. Furthermore, if $v^*$ is isolated, then $\|v^*\|$ is a simple singular value of $M_{\nu,\mu}(p_{\nu,\mu}, b)$.

Proof. Let p, a ^ IN^. From Lemma IZ6] and Remark [277] it follows that ||r;*|| is a singular value of 
^,b) and Pu, Pfj. are associated singular vectors. Assume that there is a singular value A of 

^,b) and associated singular vectors q^ G R"’‘",g^ G R"''' with A > ||n*||. Let a G [0,1] 

and fi G (0,1] with = 1. Define further g^{a,l3) := := apu + fiqu € R”'" and 

g^ia,fi) := g^ := ap^ + G R’^'^. We have \\gfi\^ = Wg^^W"^ = = 1 and with Lemma 

I2.6l it follows then 

9v,9r) = (JyPIvAPy^^^') (^Pv + l^9u,ap^, + fiqi}l 
(a\\v*\\py + (3Xqu,Oip^ + fiq^ = (^a^]|n*]| +/3^A^ 

(a^lln*!! +A\\v*\\y = lln*]]^ = p,y,Pf,'^ = -2f{v*). 


-2f{U{g,,gA) = 

> 
Consequently, it is

$$ f(U(g_\nu(\alpha,\beta), g_\mu(\alpha,\beta))) < f(v^*) \quad \text{for all } \alpha \in [0,1] \text{ and } \beta \in (0,1] \text{ with } \alpha^2 + \beta^2 = 1, $$

i.e. we can find a better approximation $U(g_\nu(\alpha,\beta), g_\mu(\alpha,\beta))$ of $b$ which is arbitrarily close to $v^*$. This contradicts the fact that $v^* \in \mathcal{M}_b$.

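Additionally, let $v^*$ be an isolated rank-one best approximation of $b$. Assume that there is a singular value $\lambda$ of $M_{\nu,\mu}(p_{\nu,\mu}, b)$ and associated singular vectors $q_\nu \in \mathbb{R}^{n_\nu}$, $q_\mu \in \mathbb{R}^{n_\mu}$ with $\lambda = \|v^*\|$, $p_\nu \perp q_\nu$, and $p_\mu \perp q_\mu$. Almost like above, let $\alpha, \beta \in [0,1]$ with $\alpha^2 + \beta^2 = 1$ and consider again $g_\nu(\alpha,\beta) = \alpha p_\nu + \beta q_\nu \in \mathbb{R}^{n_\nu}$, $g_\mu(\alpha,\beta) = \alpha p_\mu + \beta q_\mu \in \mathbb{R}^{n_\mu}$. With Lemma 2.6 it follows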

-2f{U{g^,g^)) = {a^\\v*\\+/3'^xf = \\v*f ==-2f{v*), 

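i.e. we have

$$ f(U(g_\nu(\alpha,\beta), g_\mu(\alpha,\beta))) = f(v^*) \quad \text{for all } \alpha, \beta \in [0,1] \text{ with } \alpha^2 + \beta^2 = 1. $$

Therefore, we can find an approximation $U(g_\nu(\alpha,\beta), g_\mu(\alpha,\beta))$ of $b$ which is arbitrarily close to $v^*$ and satisfies $f(U(g_\nu(\alpha,\beta), g_\mu(\alpha,\beta))) = f(v^*)$. This contradicts the fact that $v^*$ is isolated. ■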
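The singular value statement of Theorem 2.10 can be illustrated on an orthogonally decomposable tensor, for which the term with the largest weight is a best rank-one approximation. The sketch below is a hedged example under assumptions chosen for illustration (three modes of size 4, weights $2$ and $1$, random orthonormal factor vectors, modes numbered from 0). It contracts the third mode with the corresponding unit factor and inspects the singular values of the resulting matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = np.array([2.0, 1.0])
# Orthonormal factor vectors b_{j,mu} (columns of Q[mu]); an assumed test case.
Q = [np.linalg.qr(rng.standard_normal((4, 2)))[0] for _ in range(3)]
b = sum(l * np.einsum('i,j,k->ijk', Q[0][:, j], Q[1][:, j], Q[2][:, j])
        for j, l in enumerate(lam))

# Unit factors of the dominant term, v* = lam[0] * p[0] (x) p[1] (x) p[2].
p = [Q[mu][:, 0] for mu in range(3)]

# Contract the third mode with p[2]; the remaining matrix couples modes 0 and 1.
M = np.einsum('ijk,k->ij', b, p[2])
left, sing, right_t = np.linalg.svd(M)
```

Here the largest singular value equals $\|v^*\| = 2$, it is simple, and the leading singular vectors agree with the two free factors up to sign.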

Remark 2.11. The proof of Theorem 2.10 shows that if we have two different best approximations $v^*$ and $v^{**}$ of $b$ which differ only in two arbitrary components of the representation systems and $f(v^*) = f(v^{**})$, then there is a complete path between $v^*$ and $v^{**}$ described by $U(g_\nu(\alpha,\beta), g_\mu(\alpha,\beta))$ such that

$$ f(v^*) = f(U(g_\nu(\alpha,\beta), g_\mu(\alpha,\beta))). $$


3 Convergence Analysis 


In the following, we use the notations and definitions from Section 2. Our convergence analysis is mainly based on the recursion introduced in Corollary 2.4 and the following Lemma 3.1.

Lemma 3.1. Let /c G IN, /r G IN, and = p\'^^ ® ® Ci C) • • • ® p^from AlgorithmU} Then 


Iffcj/i 


fc+l 


Pi [Pi ) 


Pi 


„fc+l 

P^l-l 




k-\-l 

PfZi 


I 0 


Pu+1 (Pr+iY 


Pu+i 



T 


is a orthogonal projection and 


^k,k.+l T ^k,k.^k,k.j 


where Vk^f, := b - Vk,^. 


Proof Obviously, is a orthogonal projection. Straightforward calculations show that Vk^^- ~ 

and Vk^k^-^i — Hence we have T — Vk^k‘-\-i' ® 

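Lemma 3.2. Let $k \in \mathbb{N}$, $\mu \in \{0, \dots, d-1\}$. We have

$$ f(v_{k,\mu}) - f(v_{k,\mu+1}) = \frac{\|\Pi_{k,\mu} r_{k,\mu}\|^2}{2\|b\|^2}. \qquad (23) $$

Proof. It follows with Lemma 3.1 that

$$ \begin{aligned} f(v_{k,\mu+1}) &= \frac{1}{\|b\|^2}\left[\frac{1}{2}\langle v_{k,\mu} + \Pi_{k,\mu} r_{k,\mu},\, v_{k,\mu} + \Pi_{k,\mu} r_{k,\mu}\rangle - \langle b,\, v_{k,\mu} + \Pi_{k,\mu} r_{k,\mu}\rangle\right] \\ &= f(v_{k,\mu}) + \frac{1}{\|b\|^2}\left[\frac{1}{2}\langle \Pi_{k,\mu} r_{k,\mu}, \Pi_{k,\mu} r_{k,\mu}\rangle + \langle v_{k,\mu}, \Pi_{k,\mu} r_{k,\mu}\rangle - \langle b, \Pi_{k,\mu} r_{k,\mu}\rangle\right] \\ &= f(v_{k,\mu}) + \frac{1}{\|b\|^2}\left[\frac{1}{2}\langle \Pi_{k,\mu} r_{k,\mu}, \Pi_{k,\mu} r_{k,\mu}\rangle - \langle r_{k,\mu}, \Pi_{k,\mu} r_{k,\mu}\rangle\right] \\ &= f(v_{k,\mu}) - \frac{1}{2}\frac{\langle \Pi_{k,\mu} r_{k,\mu}, r_{k,\mu}\rangle}{\|b\|^2}, \end{aligned} $$

i.e. $f(v_{k,\mu}) - f(v_{k,\mu+1}) = \frac{\|\Pi_{k,\mu} r_{k,\mu}\|^2}{2\|b\|^2}$. ■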
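The decrease identity of Lemma 3.2 can be verified numerically for a single micro step. The sketch below rests on assumptions not taken from the paper: a random $3 \times 4 \times 5$ tensor, random factors, the objective normalised by $\|b\|^2$, and the observation from Lemma 3.1 that the projected residual equals the difference of consecutive micro-step tensors. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
b = rng.standard_normal((3, 4, 5))
p1, p2, p3 = (rng.standard_normal(n) for n in b.shape)
norm_b2 = np.sum(b * b)

def f(v):
    """Quadratic objective, assumed here to be normalised by ||b||^2."""
    return (0.5 * np.sum(v * v) - np.sum(b * v)) / norm_b2

v_old = np.einsum('i,j,k->ijk', p1, p2, p3)
# Least squares micro step for the first factor.
p1_new = np.einsum('ijk,j,k->i', b, p2, p3) / ((p2 @ p2) * (p3 @ p3))
v_new = np.einsum('i,j,k->ijk', p1_new, p2, p3)

decrease = f(v_old) - f(v_new)
step = v_new - v_old              # plays the role of the projected residual
predicted = np.sum(step * step) / (2.0 * norm_b2)
```

The two quantities `decrease` and `predicted` agree up to rounding, and both are non-negative.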

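Corollary 3.3. There exists $\alpha \in \mathbb{R}$ such that $f(v_k) \xrightarrow[k\to\infty]{} \alpha$.

Proof. Let $k \in \mathbb{N}$ and $\mu \in \mathbb{N}_d$. From Lemma 3.1 and Lemma 3.2 it follows that

$$ f(v_{k+1}) - f(v_k) = f(v_{k,d}) - f(v_{k,0}) = \sum_{\mu=1}^{d}\left[f(v_{k,\mu}) - f(v_{k,\mu-1})\right] = -\frac{1}{2\|b\|^2}\sum_{\mu=0}^{d-1}\|\Pi_{k,\mu} r_{k,\mu}\|^2 \le 0. $$

This shows that $(f(v_k))_{k\in\mathbb{N}} \subset \mathbb{R}$ is a descending sequence. The sequence of function values $(f(v_k))_{k\in\mathbb{N}}$ is bounded from below. Therefore, there exists an $\alpha \in \mathbb{R}$ such that $f(v_k) \xrightarrow[k\to\infty]{} \alpha$. ■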

Remark 3.4. From the definition of the ALS method it is already clear that $(f(v_{k,\mu}))_{k\in\mathbb{N},\,\mu\in\mathbb{N}_d}$ is a descending sequence.

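Lemma 3.5. Let $(v_{k,\mu})_{k\in\mathbb{N},\,\mu\in\mathbb{N}_d} \subset \mathcal{V}$ be the sequence from Algorithm 1. We have

$$ f(v_{k,\mu}) = -\frac{\|v_{k,\mu}\|^2}{2\|b\|^2} \qquad (24) $$

for all $k \in \mathbb{N}$, $\mu \in \mathbb{N}_d$.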

Proof. Let Zc G M and q G IN^. With Lemma [3T] it follows 

Rfcj/i—1(^) 6^ (f\.k,ki—ib^h') {vk,fnb) . 

The rest follows from the definition of /, see Eq. ([Til. ■ 

Corollary 3.6. Let {vk,i_i)k£K,iJ.£K^ C V be the sequence of represented tensors from the ALS algorithm. 
Further, let p, G IN^ and Zc G IN. The following statements are equivalent: 

(a) f{vk,ii+i) < fivk,ii) 

(b) > \\vk,i,\\‘^ 

(c) 

(d) cos‘^{(pk,ii+i) > cos‘^{Tk,ti), where cos^{Lpk,k) '■= 
Proof. Follows directly from Lemma 3.5 and


> \\vk,X > Gk,^\\plf 

where > 0 is defined in Corollary 12.41 ■ 

Lemma 3.7. Let $(v_k)_{k\in\mathbb{N}} \subset \mathcal{V}$ be the sequence of represented tensors from the ALS method. It holds

$$ \|v_{k+1} - v_k\| \xrightarrow[k\to\infty]{} 0. $$

Proof. Let A: € IN. We have 
\\Vk+l-Vkf = 

Since — Vk,f_i = see Lemma ILTI it follows further with Eq. (|2^ and (1251) that 

d-l 


M =1 


< 


, ^t=l 


d-l 

<d'^\\vk,ti+i-Vk,f,f- (25) 

fj.=0 


\\vk+i-Vkf < 2d\\bf'^{f{vk,i,+i)-f{vk,k.))- 

fj.=0 


With Corollary 3.3 we have $\left(f(v_{k,\mu+1}) - f(v_{k,\mu})\right) \xrightarrow[k\to\infty]{} 0$, hence $\|v_{k+1} - v_k\| \xrightarrow[k\to\infty]{} 0$. ■

Definition 3.8 ($A(v_k)$, critical points). Let $(v_k)_{k\in\mathbb{N}} \subset \mathcal{V}$ be the sequence of represented tensors from Algorithm 1. The set of accumulation points of $(v_k)_{k\in\mathbb{N}}$ is denoted by $A(v_k)$, i.e.

$$ A(v_k) := \{v \in \mathcal{V} : v \text{ is an accumulation point of } (v_k)_{k\in\mathbb{N}}\}. \qquad (26) $$

The set $\mathfrak{M}$ of critical points of the optimisation problem from Eq. (3) is defined as follows:

$$ \mathfrak{M} := \{v \in \mathcal{V} : \exists p \in P : v = U(p) \wedge F'(p) = 0\}. \qquad (27) $$

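Proposition 3.9. The sequence of parameters $(p_\mu^k)_{\mu\in\mathbb{N}_d,\,k\in\mathbb{N}}$ from the ALS algorithm is bounded.

Proof. From the definition of $f$ and Lemma 3.5 it follows that

$$ -\frac{1}{2} \le f(v_{k,\mu}) = -\frac{\|v_{k,\mu}\|^2}{2\|b\|^2} \quad\Longrightarrow\quad \|v_{k,\mu}\| \le \|b\|, $$

i.e. the sequence $(v_{k,\mu})_{\mu\in\mathbb{N}_d,\,k\in\mathbb{N}} \subset \operatorname{Range}(U)$ is bounded. The sequence $(\|v_{k,\mu}\|)_{\mu\in\mathbb{N}_d,\,k\in\mathbb{N}}$ is the product of the following $d$ sequences $(\|p_\mu^k\|)_{k\in\mathbb{N}} \subset \mathbb{R}$. According to Corollary 3.6 the sequences $(\|p_\mu^k\|)_{k\in\mathbb{N}}$ are monotonically increasing. Since the product $\|v_{k,\mu}\|$ is bounded and all sequences $(\|p_\mu^k\|)_{k\in\mathbb{N}}$ are monotonically increasing, it follows that all $(p_\mu^k)_{k\in\mathbb{N}}$ are bounded. This means the sequence $(p_\mu^k)_{\mu\in\mathbb{N}_d,\,k\in\mathbb{N}}$ is bounded. ■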


The following statements are proved in a corresponding article about the convergence of alternating least squares optimisation in general tensor format representations; please see [5] for more information regarding the proofs.

Lemma 3.10 (JIl). We have 


max 

0<At<L-l 



-^ 0 . 

/c—>-oo 
Corollary 3.11 ([5]). Let $(p^k)_{k\in\mathbb{N}}$ be the sequence from Algorithm 1 and $F : P \to \mathbb{R}$ from Eq. (2). We have

$$ \lim_{k\to\infty} F'(p^k) = 0. $$

Theorem 3.12 ([5]). Let $(v_k)_{k\in\mathbb{N}}$ be the sequence of represented tensors from the ALS method. Every accumulation point of $(v_k)_{k\in\mathbb{N}}$ is a critical point, i.e. $A(v_k) \subseteq \mathfrak{M}$. Further, we have

$$ \operatorname{dist}(v_k, \mathfrak{M}) \xrightarrow[k\to\infty]{} 0. $$

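Let $\bar v \in \mathfrak{M}$ be a critical point and $N := \prod_{\mu=1}^{d} n_\mu$. Further, let $(p^k)_{k\in\mathbb{N}} \subset P$ be the sequence of parameters from the ALS algorithm and $R \in \mathbb{R}^{N\times(N-1)}$ be a matrix with $R^T R = \mathrm{Id}_{\mathbb{R}^{N-1}}$ and $\operatorname{span}(\bar v)^{\perp} = \operatorname{Range}(R)$, i.e. the column vectors of $R$ build an orthonormal basis of the linear space $\operatorname{span}(\bar v)^{\perp}$. Then the block matrix

$$ V := [\tilde v \;\; R] \in \mathbb{R}^{N\times N}, \qquad (\tilde v := \bar v/\|\bar v\|) \qquad (28) $$

is orthogonal, i.e. the columns of the matrix $V$ build an orthonormal basis of the tensor space $\mathcal{V}$. The following matrix $N_{k,\mu} \in \mathbb{R}^{N\times N}$ is important in order to describe the rate of convergence for the ALS method: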

A*—1 ^ ^ s d 

Nk,^ := (g) Id ® (g) Id, 

where the matrix ^from Corollary 12.41 Further, it follows from Corollary 12.41 that for 
the ALS micro step the following equation: 


r’fcjAi+i — bVk^fj,Vk,n 


(29) 


holds. The tensor Vk,/! and the matrix Nk^/j, are represented with respect to the basis V, i.e 


and 




VV^Vk,^, = [ V 


( V^Vk,fi \ 



Nk,tJ. 


V {V^Nk,f,V) V^=[v R 


V^Nk,^,V vf'Nk,^,R 1 r jy iT 

R^Nk,,v R^Nk,,R J [ ^ ^ ] 


The recursion formula (l29l ) leads to the recursion of the coefficient vector 


V ^k-\-l,fi 


H ^k,filL H ^k,fiR f ^k,fi \ f H ^k,,filL ^k,fL H“ H ^k,fiR ^k,fi \ 

R^Nk,ij,v R^Nk^^R \ V Sk,^, ) V Ck,kL + R^Nk,ixR Sk,ti ) 


Without loss of generality we can assume that A 0 ^rid |cfc_^| / 0. Therefore, the following terms are 

well defined: 

(s) ^k,fj, T R blk,fj,R 'SfcjAtll 

%,kL 


v^Nk,^v Ck,f, + v^Nk,i,R 
\^k,a\ 
This preliminary consideration gives a recursion formula for the tangent of the angle between $\bar v$ and $v_{k,\mu+1}$. We have

tan^ Z[v,Vk,f,+i] 


\RR / 


11-^ vk,ti+i\\ _ llg/c.^+ilr 


a] 








Remark 3.13. Obviously, if the sequence of parameters $(p^k)_{k\in\mathbb{N}} \subset P$ is bounded, then the set of accumulation points of $(p^k)_{k\in\mathbb{N}}$ is not empty. Consequently, the set $A(v_k)$ is not empty, since the map $U$ is continuous.

Theorem 3.14 ([5]). If one accumulation point $\bar v \in A(v_k) \subseteq \mathfrak{M}$ is isolated, then we have

$$ v_k \xrightarrow[k\to\infty]{} \bar v. $$

Furthermore, we have for the rate of convergence of an ALS micro step


|tanZ[u,Ufc,^+i]| < |tanZ[u,Ufc,^]| , 


where 




lim sup 

/c—>-oo 


(-5) 


(c) 


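If $q_\mu = 0$, then the sequence $(|\tan\angle[\bar v, v_{k,\mu}]|)_{k\in\mathbb{N}}$ converges Q-superlinearly. If $q_\mu < 1$, then the sequence $(|\tan\angle[\bar v, v_{k,\mu}]|)_{k\in\mathbb{N}}$ converges at least Q-linearly. If $q_\mu > 1$, then the sequence $(|\tan\angle[\bar v, v_{k,\mu}]|)_{k\in\mathbb{N}}$ does not converge Q-linearly.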

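Remark 3.15. The calculation from Example 1.2 shows that

$$ \limsup_{k\to\infty}\frac{q^{(s)}_{k,\mu}}{q^{(c)}_{k,\mu}} = 0 \quad \text{for all } \mu \in \mathbb{N}_d. $$

Hence, the ALS algorithm converges here Q-superlinearly. Furthermore, in Example 1.3 we showed for $\lambda < \frac{1}{2}$

$$ \limsup_{k\to\infty}\frac{q^{(s)}_{k,\mu}}{q^{(c)}_{k,\mu}} = \frac{\lambda}{2}\left(3\lambda + \lambda^2 + \sqrt{(3\lambda + \lambda^2)^2 + 4\lambda}\right) < 1 \quad \text{for all } \mu \in \mathbb{N}_3. $$

Hence, we have here Q-linear convergence.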

Corollary 3.16 (10). If the set of critical points $\mathfrak{M}$ is discrete,* then the sequence of represented tensors $(v_k)_{k\in\mathbb{N}}$ from the ALS method is convergent.

In the following example it will be shown that the ordering of the indices may play an important role for the convergence of the ALS procedure.

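*In topology, a set which is made up only of isolated points is called discrete.

Remark 3.17. Let $b = \bigotimes_{\mu=1}^{3} b_{1\mu} + \lambda\bigotimes_{\mu=1}^{3} b_{2\mu}$, with $0 < \lambda < 1$, $\|b_{1\mu}\| = \|b_{2\mu}\| = 1$ and $\langle b_{1\mu}, b_{2\mu}\rangle = 0$ for $\mu \in \mathbb{N}_3$. Let further $v_0 = C\bigotimes_{\mu=1}^{3} p_\mu$ for some $C \in \mathbb{R}$ and

$$ p_\mu = b_{1\mu} + \alpha_\mu b_{2\mu} \qquad (30) $$

for some $\alpha_\mu \in \mathbb{R}$. Assume after each ALS micro step the parameters $p_\mu$ are rescaled to the form (30) (obviously, a scaling of parameters has no effect on the future behaviour of the ALS method). After the first four micro steps one gets

$$ \begin{aligned} p_1^1 &= b_{11} + \lambda\,\alpha_2\alpha_3\, b_{21}, \\ p_2^1 &= b_{12} + \lambda^2\alpha_2\alpha_3^2\, b_{22}, \\ p_3^1 &= b_{13} + \lambda^4\alpha_2^2\alpha_3^3\, b_{23}, \\ p_1^2 &= b_{11} + \lambda^7\alpha_2^3\alpha_3^5\, b_{21}. \end{aligned} $$

So for $v_1 := p_1^2 \otimes p_2^1 \otimes p_3^1$ one gets

$$ v_1 = C\,(b_{11} + \lambda^7\alpha_2^3\alpha_3^5\, b_{21}) \otimes (b_{12} + \lambda^2\alpha_2\alpha_3^2\, b_{22}) \otimes (b_{13} + \lambda^4\alpha_2^2\alpha_3^3\, b_{23}) $$

with some $C \in \mathbb{R}$. Now assume the order of the directions for ALS optimisation is changed from $(1, 2, 3)$ to $(1, 3, 2)$, i.e. after optimising the first component $p_1$ we optimise the third one (i.e. $p_3$) and only then the second one (i.e. $p_2$). The same number of micro steps will result in a tensor

$$ \tilde v_1 = \tilde C\,(b_{11} + \lambda^7\alpha_2^5\alpha_3^3\, b_{21}) \otimes (b_{12} + \lambda^4\alpha_2^3\alpha_3^2\, b_{22}) \otimes (b_{13} + \lambda^2\alpha_2^2\alpha_3\, b_{23}) $$

with some $\tilde C \in \mathbb{R}$. Now if $\alpha_2$ and $\alpha_3$ satisfy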


a2 > 1 > 

« 2«3 - - " 2 « 3 > 

then it is not difficult to check that $v_1$ satisfies the dominance condition from Eq. (7) for $j = 1$, whereas $\tilde v_1$ satisfies the dominance condition for $j = 2$. Thus, with the same starting point the ALS iteration will converge to the global minimum $\bigotimes_{\mu=1}^{3} b_{1\mu}$ for one ordering of the indices and to the local minimum $\lambda\bigotimes_{\mu=1}^{3} b_{2\mu}$ for another ordering. Note that $v_0$ did not fulfil the dominance conditions, but depending on the ordering of the ALS micro steps $v_0$ leads to a dominance condition for different terms.
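The dependence on the update order can be reproduced with a small experiment. The following sketch is illustrative only: the value $\lambda = 0.5$, the starting coefficients $\alpha_2 = e/\lambda$ and $\alpha_3 = e^{-0.8}/\lambda$, the unit vectors used as factors and the function name `run_als` are all assumptions chosen so that the two orderings end at different limits. Modes are numbered from 0, so the orders $(1,2,3)$ and $(1,3,2)$ become `(0, 1, 2)` and `(0, 2, 1)`.

```python
import numpy as np

lam = 0.5
b = np.zeros((2, 2, 2))
b[0, 0, 0], b[1, 1, 1] = 1.0, lam      # b = e1^{(x)3} + lam * e2^{(x)3}

def run_als(order, sweeps=6):
    """ALS with the factors updated in the given order (0-based mode indices)."""
    # Start as in Eq. (30): p_mu = e1 + alpha_mu * e2, with illustrative alpha values.
    alpha = [0.0, np.e / lam, np.exp(-0.8) / lam]
    ps = [np.array([1.0, a]) for a in alpha]
    for _ in range(sweeps):
        for mu in order:
            others = [nu for nu in range(3) if nu != mu]
            t = b
            # Contract the higher mode first so the lower axis number stays valid.
            for nu in reversed(others):
                t = np.tensordot(t, ps[nu], axes=([nu], [0]))
            ps[mu] = t / np.prod([ps[nu] @ ps[nu] for nu in others])
    return np.einsum('i,j,k->ijk', *ps)

v_123 = run_als((0, 1, 2))   # update order (1, 2, 3)
v_132 = run_als((0, 2, 1))   # update order (1, 3, 2)
```

With these values the first ordering ends at the term with weight 1 and the second at the term with weight $\lambda$, although both runs start from the same point.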


4 Numerical Experiments 


In this section, we observe the convergence behaviour of the ALS method using data from interesting examples and, more importantly, from real applications. In all cases, we focus particularly on the convergence rate.


4.1 Example 1 

We consider an example introduced by Mohlenkamp in Section 4.3.5]. Here we have 


b 




ei:= 


— 

61 : = 



0 

1 


see Eq. (5). The tensor $b$ is orthogonally decomposable. Although the example is rather simple, it is of theoretical interest, since the ALS method converges superlinearly, cf. the discussion in Section 1. The tensor $b$ has only two terms, therefore the upper bound for the convergence rate from Eq. (8) is sharp, cf. Eq. (9). Let $\tau > 0$; we define the initial guess of the ALS algorithm by



We have for $\tau < \frac{1}{2}$ that the term $b_2$ dominates at the initial guess $v_0(\tau)$. Therefore, the ALS iteration converges to $b_2$. If $\tau > \frac{1}{2}$, then $b_1$ dominates at $v_0(\tau)$ and the sequence from the ALS method will converge to $b_1$. In the first test the tangent of the angle between the current iteration point and the corresponding parameter of the dominant term $b_l$ ($1 \le l \le 2$) is plotted, i.e.


tan tpk,i 


' 1 - COS^ i^k,l 
cos^ ipk^l 


(31) 


where cos (pk,i = 

plots for the quotient 


To illustrate the superlinear convergence of the AES method, we present further 


tany?fc+i,f 

tan 


(32) 


4.2 Example 2 

Most algorithms in ab initio electronic structure theory compute quantities in terms of one- and two-electron integrals. In [1] we considered the low-rank approximation of the two-electron integrals. In order to demonstrate the convergence of the ALS method on an example of practical interest, we use the order-4 tensor for the two-electron integrals of the so-called AO basis for the CH4 molecule. We refer the reader to [1] for a detailed description of our example. In this example the ALS method converges Q-linearly, see Figure 4.


4.3 Example 3 


We consider the tensor

$$ b_\lambda = \bigotimes_{\mu=1}^{3} p + \lambda\,(p \otimes q \otimes q + q \otimes p \otimes q + q \otimes q \otimes p) $$

from Example 1.3. The vectors $p$ and $q$ are arbitrarily generated orthogonal vectors with norm 1. The values of

are plotted, where ip\ is the angle between p\ and the limit point p (i.e. tan = 7 %^, for k > 2 ). 

\Pk’P) 

For the case $\lambda = 0.5$ the convergence is sublinear, whereas for $\lambda = 0.2$ it is Q-linear.
(a) $q_{k,1}$ is plotted for $\tau \in \{0.5001, 0.505, 0.6\}$. Here the term $b_1$ dominates at every iteration point.

(b) $q_{k,2}$ is plotted for $\tau \in \{0.4999, 0.495, 0.4\}$. Here the term $b_2$ dominates at every iteration point.

Figure 3: $q_{k,l}$ from Eq. (32) is plotted for $l \in \{1, 2\}$ and different values for $\tau$.

(a) The tangents $\tan\varphi_{k,1}$ for $\lambda = 0.2$.

Figure 5: The approximation of $b_\lambda$ from Example 1.3 is considered. The tangent of the angle between the current iteration point and the limit point is plotted with respect to the iteration number. For $\lambda = 1/2$, we have sublinear convergence. But for $\lambda = 0.2 < 1/2$ the sequence converges Q-linearly.